![]() |
The Twofish Encryption Algorithm: A 128-Bit Block Cipher
by Bruce Schneier ; John Kelsey ; Doug Whiting ; David Wagner ; Chris Hall ; Niels Ferguson Wiley Computer Publishing, John Wiley & Sons, Inc. ISBN: 0471353817 Pub Date: 03/01/99 |
Previous | Table of Contents | Next |
In the design of Twofish, we tried to stress the following principles:
These principles were applied not only to the overall design of Twofish, but to the design of the S-boxes and the key schedule.
The goal of performance-driven design is to build and evaluate ciphers on the basis of performance [SW97]. The early post-DES cipher designs would often compete on the number of rounds in the cipher. The original FEAL paper [SM88], for example, discussed the benefits of a stronger round function and fewer rounds. Other cipher designs of the periodREDOC-II [CW91], LOKI89 [BPS90] and LOKI91 [BKPS93], IDEA [LM91, LMM91]only considered performance as an afterthought. Khufu/Khafre [Mer91] was the first published algorithm that explicitly used operations that were efficient on 32-bit microprocessors; SEAL [RC94, RC98] is a more recent example. RC2 [Riv97, KRRR98] and Jeroboam [CM98] were designed for 16-bit microprocessors, SOBER [Ros98] for 8-bit ones. Other, more recent designs, do not seem to take performance into account at all. Two 1997 designs, SPEED [Zhe97]1 and Zhu-Guo [ZG97], are significantly slower than alternatives that existed years previous.
1SPEED has been cryptanalyzed in [HKSW98, UTK98, HKR+98].
Arbitrary metrics, such as the number of rounds, are not good measures of performance. What is important is the ciphers speed: the number of clock cycles per byte encrypted. When ciphers are analyzed according to this property, the results can be surprising [SW97]. RC5 might have twice the number of rounds of DES,2 but since its round function is more than twice as fast as DES, RC5 is faster than DES on most microprocessors.
2Here we use the term round in the traditional sense: as it was defined by DES [NBS77] and has been used to describe Feistel-network ciphers ever since. The RC5 documentation [Riv95] uses the term round differently: one RC5-defined round equals two Feistel rounds.
Even when cryptographers made efforts to use efficient 32-bit operations, they often lacked a full appreciation of low-level software optimization principles associated with high-performance CPUs. Thus, many algorithms are not as efficient as they could be. Minor modifications in the design of Blowfish [Sch94], SEAL [RC94, RC98], and RC4 [Sch96] could improve performance without affecting security [SW97] (or, alternatively, increase the algorithms complexity without affecting performance). In designing Twofish, we tried to evaluate all design decisions in terms of performance.
Since NISTs platform of choice was the Intel Pentium Pro [NIST97b], we concentrated on that platform. However, we did not ignore performance on other 32-bit CPUs, as well as 8-bit and 16-bit CPUs. If there is any lesson from the past twenty years of microprocessors, it is that the high end gets better and the low end never goes away. Yesterdays top-of-the-line CPUs are currently in smart cards. Todays CPUs will eventually be in smart cards, while the 8-bit microprocessors will move to devices even smaller. The only thing we did not consider in our performance metrics is bitslice implementations [Bih97, SAM97, NM97], since these can only be used in very specialized applications and often require unrealistic implementations; e.g., 32 simultaneous ECB encryptions, or 32 interleaved IVs.3
3One AES submission, Serpent [BAK98], uses ideas from bitslice implementations to create a cipher that is optimized for 32-bit processors while sacrificing performance on 8-bit processors.
During our design, we constantly evaluated the relative performance of different modifications to our round function. Twofishs round function encrypts at about 20 clock cycles; 16 rounds translates to about 320 clock cycles per block encrypted. When we contemplated a change to the round function, we evaluated it in terms of increasing or decreasing the number of rounds to keep performance constant. For example:
This analysis is necessarily dependent on the microprocessor architecture the algorithm is being compared on. While we focused on the Intel Pentium architecture, we also tried to keep 8-bit smart card and hardware implementations in mind. For example, we considered using an 8-by-8 MDS matrix over GF(24) to ensure a finer-grained diffusion, instead of a 4-by-4 MDS matrix over GF(28); the former would have been no slower on a Pentium but at least twice as slow on a low-memory smart card.
In the last decade there has been considerable research in designing ciphers to be resistant to known attacks [Nyb91 Nyb93, OCo94a, OCo94b, OCo94c, Knu94a, Knu94b, Nyb94, DGV94b, Nyb95, NK95, Mat96, Nyb96], such as differential [BS93], linear [Mat94], and related-key cryptanalysis [WH87, Bih94, KSW96, KSW97]. This research has culminated in strong cipher designsCAST-128 [Ada97a] and MISTY [Mat97] are probably the most noteworthyas well as some excellent cryptanalytic theory.
Previous | Table of Contents | Next |